163
DOI: 10.1201/9781003355205-5
C h a p t e r 5
RNA-Seq Data Analysis
5.1 INTRODUCTION TO RNA-SEQ
RNA sequencing or shortly RNA-Seq is a high-throughput sequencing technique to
examine gene expression or profiling in biological samples. Rather than the whole genome,
RNA-Seq focuses only on transcriptomes to tell us which one of the genes in the samples
are turned on or off and to what extent. The transcriptome is the set of all messenger RNA
transcribed from genes in cells. Genes represent a small portion of the whole genome of
an organism. For instance, genes are only around 3% of the human genome. RNA, in gen-
eral, can either be translated into proteins or play functional role such as rRNA for protein
synthesis, tRNA for amino acid transfer, and microRNA that plays a role in gene regula-
tion. Out of whole transcriptome, protein-coding RNA (mRNA) is around 2%–4%, while
the greatest percentage of RNA molecules is the other types of RNA. The gene expres-
sion studies mainly focus on mRNA only because it is translated into proteins. Proteins
are required for the structure, function, and regulation of the cells forming body’s tissues
and organs. Dysfunctions of proteins implicate in most of the diseases and healthy condi-
tions. By studying the mRNA, we can find out which genes are active in a particular cell
type or under a certain condition, giving us a clue about the function of the cells and the
biological activities under that condition. Moreover, we can compare the gene activities
in different types of cells or in different conditions and study how the patterns of gene
expression change over time or in response to different stimuli. We can use such informa-
tion to understand biological activities, normal functions of the cells, effects of pathogens
or therapy, or response to any factors. In short, examining mRNA of cells under a specific
condition provides a snapshot or a checkpoint of the cellular activities at that moment.
Therefore, it is mostly studied to investigate how a certain disease like cancer may affect the
gene expression or to study the cellular response to a treatment or a condition.
A gene is a DNA sequence that consists of several functional components; each plays
a different role in the process of gene transcription into RNA. In general, the functional
gene regions can be divided into two main units: the promoter region and the coding
region. The promoter region controls the transcription of the mRNA, whereas coding